Probabilistic Deduplication of Anonymous Web Traffic
Abstract
Cookies and login-based authentication often provide incomplete data for stitching website visitors across multiple sources, necessitating probabilistic deduplication. We address this challenge by formulating the problem as a binary classification task over pairs of anonymous visitors. We compute visitor proximity vectors by converting very high-cardinality categorical variables, such as IP addresses, product search keywords and URLs, to continuous numeric variables using the Jaccard coefficient for each attribute. Our method achieves AUC and F-scores of about 90% in identifying whether two cookies map to the same visitor, while providing insights into the relative importance of the features available in Web analytics for the deduplication process.
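The proximity-vector step described in the abstract can be illustrated with a minimal sketch. It assumes each cookie is summarized as one set of observed values per attribute. The attribute names (`ips`, `keywords`, `urls`), the helper names and the sample data are hypothetical, and treating two empty sets as similarity 0.0 is a convention chosen here, not something the abstract specifies. The resulting vector would then be the input to a binary classifier, which is not shown.

```python
from typing import Dict, List, Set

# Hypothetical attribute names, chosen to mirror the examples in the abstract.
ATTRIBUTES = ["ips", "keywords", "urls"]

# A visitor (cookie) is modeled as a mapping from attribute name to observed values.
Visitor = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|.

    Assumption: two empty sets yield 0.0 (no evidence of similarity).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def proximity_vector(v1: Visitor, v2: Visitor) -> List[float]:
    """One Jaccard score per attribute, in the order given by ATTRIBUTES."""
    return [
        jaccard(v1.get(attr, set()), v2.get(attr, set()))
        for attr in ATTRIBUTES
    ]


# Made-up sample data for two cookies.
cookie_a: Visitor = {
    "ips": {"10.0.0.1", "10.0.0.2"},
    "keywords": {"laptop", "bag"},
    "urls": {"/home", "/cart"},
}
cookie_b: Visitor = {
    "ips": {"10.0.0.1"},
    "keywords": {"laptop", "mouse"},
    "urls": {"/home", "/cart"},
}

vec = proximity_vector(cookie_a, cookie_b)
print(vec)  # three scores: ips, keywords, urls
```

Each high-cardinality categorical attribute is reduced to a single number in [0, 1], so a pair of cookies becomes a short fixed-length numeric vector regardless of how many distinct IP addresses, keywords or URLs exist overall.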
Similar resources
End-to-end data deduplication for the mobile Web: extended report
The emergence of affordable mobile devices with rich interfaces and access to high bandwidth wireless connections has revolutionized mobile Web access. However, such new trends also imply downloading larger data volumes from the Web, with considerable battery and, often, monetary costs that inevitably degrade user experience. One natural direction to tackle such a challenge is to exploit the su...
Anonymous Web Browsing against Traffic Analysis Attacks by Reusing the Cache Memory
Anonymous web browsing is a hot topic with many potential applications for privacy reasons. However, there are few such systems which can provide high level anonymity for web browsing. The reason is the current dominant dummy packet padding method for anonymization against traffic analysis attacks. This method inherits huge delay and bandwidth waste, which inhibits its use for web browsing. In ...
An Infrastructure for Anonymous Internet Services
Although recent research provides many techniques for anonymous web-browsing, anonymous internet services, i.e. to run a web server or file server without revealing one's identity, have received little or no attention. In this paper, we present an approach for anonymous internet services, enabling anonymous web and file servers and to run services like anonymous instant messaging and secure shel...
Towards Minimizing the Required Bandwidth for Mobile Web Browsing
The number of smartphones has been increasing at a very fast pace over the last several years. At this rate, it is quite likely that mobile web traffic will take a serious portion of the global Internet traffic in the very near future. However, the improvement of mobile web access technology has been known to lag behind the current growth: we face issues such as limited bandwidth and computatio...
Thesis Proposal
Privacy and anonymity are essential to society in both the physical and the electronic domain. Anonymous police tips and witness protection programs are common in the physical realm. The Internet can provide an electronic medium for free expression, but not if users can be identified and censored by totalitarian governments. There have been many attempts to provide anonymous electronic communic...
Publication date: 2015